Improving speaker identification performance in reverberant conditions using lip information
Authors
Abstract
This paper considers the improvement of speaker identification performance in reverberant conditions using additional lip information. Automatic speaker identification (ASI) using speech characteristics alone can be highly successful; however, problems occur when training and testing conditions are mismatched. In particular, we find that ASI performance drops dramatically when the system is trained on anechoic speech but tested on reverberant speech. Previous work [1][2] has shown that speaker-dependent information can be extracted from the static and dynamic qualities of moving lips. Given that lip information is unaffected by reverberation, we choose to fuse this additional information with speech data. We propose a new method for estimating confidence levels to allow adaptive fusion of the audio and visual data. Identification results are presented for increasing levels of artificially reverberated data, where lip information is shown to provide excellent ASI performance improvement.
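The abstract does not spell out how the confidence levels are estimated, so the following is only a minimal sketch of confidence-weighted late fusion in general, not the authors' method. It assumes each modality produces one raw score per enrolled speaker, and it uses a hypothetical confidence measure (the gap between the two highest posteriors). The names `softmax`, `confidence`, and `fuse` are illustrative.

```python
import math


def softmax(scores):
    """Turn raw per-speaker scores (e.g. log-likelihoods) into posteriors."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(posteriors):
    """Hypothetical confidence: gap between the two highest posteriors."""
    ranked = sorted(posteriors, reverse=True)
    return ranked[0] - ranked[1]


def fuse(audio_scores, lip_scores):
    """Confidence-weighted late fusion.

    Returns (index of the identified speaker, weight given to audio).
    Assumes at least two enrolled speakers and equal-length score lists.
    """
    p_audio = softmax(audio_scores)
    p_lip = softmax(lip_scores)
    c_audio, c_lip = confidence(p_audio), confidence(p_lip)
    total = c_audio + c_lip
    # Fall back to equal weights when neither modality is confident.
    w_audio = 0.5 if total == 0 else c_audio / total
    fused = [w_audio * a + (1.0 - w_audio) * v for a, v in zip(p_audio, p_lip)]
    return fused.index(max(fused)), w_audio


# Reverberant-like case: audio scores are nearly flat, lip scores are peaked.
speaker, w_audio = fuse([0.1, 0.0, 0.05], [0.0, 4.0, 0.0])
print(speaker, round(w_audio, 3))
```

In this sketch a flat audio score distribution, as might result from mismatched reverberant test speech, lowers the audio weight, so the decision leans on the lip scores. Other confidence measures (entropy, score dispersion) could be substituted without changing the fusion step.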
Similar resources
Discrimination Analysis of Lip Motion Features for Multimodal Speaker Identification and Speech-reading
In this thesis a new multimodal speaker/speech recognition system that integrates audio, lip texture, lip geometry, and lip motion modalities is presented. There have been several studies that jointly use audio, lip intensity and/or lip geometry information for speaker identification and speech-reading applications. This work proposes using explicit lip motion information, instead of or in addi...
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a comb...
Improving robustness to compressed speech in speaker recognition
The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determin...
Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing
We present an improved method for training Deep Neural Networks for dereverberation and show that it can improve performance for the speech processing tasks of speaker verification and speech enhancement. We replicate recently proposed methods for dereverberation using Deep Neural Networks and present our improved method, highlighting important aspects that influence performance. We then experi...
Performance Analysis of Robust Method to Identify the Speaker Using Lip Segmentation
This document addresses the problem of providing security to vehicles based on a unique biometric feature: lip motion. This work proposes the use of explicit lip motion features for speaker identification so that the car can be unlocked depending on the result of the identification process. For the identification process, lip boundaries are tracked over the images and compared to the database. Fo...